Oncolist Server API Examples

Author: Guorong Xu

2016-09-19

This notebook shows how to calculate correlations, annotate gene clusters, and generate JSON files on AWS.

Notice: Before running this notebook, open /notebooks/BasicCFNClusterSetup.ipynb and use it to install the CFNCluster package on your Jupyter Notebook server.

1. Configure the AWS key pair, the S3 data locations, and the project information


In [ ]:
import os
import sys

sys.path.append(os.getcwd().replace("notebooks", "cfncluster"))

## S3 input and output addresses.
s3_input_files_address = "s3://path/to/input folder"
s3_output_files_address = "s3://path/to/output folder"

## CFNCluster name
your_cluster_name = "cluster_name"

## Private key file (.pem) of the key pair used to access the cluster.
private_key = "/path/to/private_key.pem"

## Whether to delete the CFNCluster after the job is done.
delete_cfncluster = False
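The S3 addresses above are expected to be full s3://bucket/prefix URIs. As an optional pre-flight check, here is a minimal sketch that uses only the standard library to split an address into its bucket and key prefix and reject anything that is not an s3:// URI. The helper split_s3_uri is hypothetical; it is not part of CFNClusterManager or PipelineManager.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split an s3:// URI into (bucket, key_prefix).

    Hypothetical helper for checking the addresses configured above;
    it is not part of CFNClusterManager or PipelineManager.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError("not an s3:// URI: %r" % uri)
    # The bucket is the network location; the key prefix is the path
    # without its leading slash.
    return parsed.netloc, parsed.path.lstrip("/")


# Example with placeholder addresses (not the notebook's real values).
for address in ("s3://my-bucket/expression/input",
                "s3://my-bucket/expression/output"):
    print(split_s3_uri(address))
```

In the notebook itself you would pass s3_input_files_address and s3_output_files_address instead of the placeholder strings.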

Notice:

For the output JSON file to be annotated correctly, the expression file name must follow this pattern: "GSE number_Author name_Disease name_Number of Arrays_Institute name.txt".

For example: GSE65216_Maire_Breast_Tumor_159_Arrays_Paris.txt
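Because the fields are separated by underscores and the disease name itself may contain one (as in "Breast_Tumor"), it can help to check a file name before uploading it. The sketch below is a hypothetical validator, not the pipeline's own parser. It assumes, as in the example, that the array count is written as "<number>_Arrays" and that only the disease name may contain underscores.

```python
import re

# Hypothetical pattern for the naming rule above. It assumes the array
# count is written as "<number>_Arrays" (as in the example) and that only
# the disease name may itself contain underscores.
EXPRESSION_FILE_PATTERN = re.compile(
    r"^(?P<gse>GSE\d+)_(?P<author>[^_]+)_(?P<disease>.+)"
    r"_(?P<arrays>\d+)_Arrays_(?P<institute>[^_]+)\.txt$"
)


def parse_expression_file_name(file_name):
    """Return the annotation fields encoded in an expression file name."""
    match = EXPRESSION_FILE_PATTERN.match(file_name)
    if match is None:
        raise ValueError("file name does not follow the naming rule: %r" % file_name)
    fields = match.groupdict()
    # Store the number of arrays as an integer rather than a string.
    fields["arrays"] = int(fields["arrays"])
    return fields


print(parse_expression_file_name("GSE65216_Maire_Breast_Tumor_159_Arrays_Paris.txt"))
```

For the example file name, the greedy disease group can only stop at "Breast_Tumor", because the remainder must match "_<digits>_Arrays_<institute>.txt".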

2. Create a CFNCluster

Notice: The CFNCluster package can only be installed on a Linux machine that supports pip installation.


In [ ]:
import CFNClusterManager, ConnectionManager

## Create a new cluster and get the IP address of its master node
master_ip_address = CFNClusterManager.create_cfn_cluster(cluster_name=your_cluster_name)

## Connect to the master node over SSH
ssh_client = ConnectionManager.connect_master(hostname=master_ip_address,
                                              username="ec2-user",
                                              private_key_file=private_key)

After you have verified the project information, you can execute the pipeline. When the job is done, you will see the log information returned from the cluster.

Checking the disease names


In [ ]:
import PipelineManager

## Call this function to list the disease names included in the annotation.
PipelineManager.check_disease_name()

## Set the disease name to one of the names listed by the call above.
disease_name = "BreastCancer"

Run the pipeline with the selected operation.


In [ ]:
import PipelineManager

## Define the operation:
##   calculate: calculate correlations
##   oslom_cluster: cluster the gene modules
##   print_oslom_cluster_json: print the JSON files
##   all: run all operations
operation = "all"

## Run the pipeline
PipelineManager.run_analysis(ssh_client, disease_name, operation,
                             s3_input_files_address, s3_output_files_address)
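Since operation is passed to run_analysis as a plain string, a typo would only surface once the job reaches the cluster. A minimal local check is sketched below. The helper expand_operation is hypothetical, and the step order it returns for "all" (correlation, then clustering, then JSON output) is an assumption based on the order in which the operations are listed; the notebook only says that "all" runs all operations.

```python
# The operations listed above, in the order in which "all" is assumed to
# run them (an assumption; the notebook only says "run all operations").
PIPELINE_STEPS = ("calculate", "oslom_cluster", "print_oslom_cluster_json")


def expand_operation(operation):
    """Validate an operation name and return the steps it stands for.

    Hypothetical helper; PipelineManager.run_analysis receives the
    operation string unchanged and does not call this function.
    """
    if operation == "all":
        return list(PIPELINE_STEPS)
    if operation in PIPELINE_STEPS:
        return [operation]
    raise ValueError("unknown operation %r; expected one of %s or 'all'"
                     % (operation, ", ".join(PIPELINE_STEPS)))


print(expand_operation("all"))
print(expand_operation("oslom_cluster"))
```

Running this check before run_analysis fails fast on a misspelled value such as "oslom_clustering" instead of waiting for the cluster to report an error.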

To check the processing status


In [ ]:
import PipelineManager

PipelineManager.check_processing_status(ssh_client)
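check_processing_status reports the status each time you run the cell. If you would rather poll until the job finishes, a generic loop such as the one below could wrap it. This is a sketch under a stated assumption: the notebook does not say what check_processing_status returns, so is_done is a hypothetical zero-argument callable that you would write yourself (for example, by inspecting whatever status the call reports).

```python
import time


def wait_until_done(is_done, interval_seconds=60, timeout_seconds=6 * 3600):
    """Call is_done() every interval_seconds until it returns True.

    Returns the number of checks made; raises TimeoutError if the timeout
    elapses first. is_done is any zero-argument callable (hypothetical;
    not part of PipelineManager).
    """
    deadline = time.monotonic() + timeout_seconds
    checks = 0
    while True:
        checks += 1
        if is_done():
            return checks
        if time.monotonic() >= deadline:
            raise TimeoutError("job still running after %d checks" % checks)
        time.sleep(interval_seconds)


# Demonstration with a stand-in check that reports completion on the third call.
responses = iter([False, False, True])
print(wait_until_done(lambda: next(responses), interval_seconds=0))
```

Using time.monotonic for the deadline keeps the timeout correct even if the system clock changes while the job is running.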

To delete the cluster, make sure your_cluster_name is set and delete_cfncluster is True (both are configured in step 1), then run the cell below.


In [ ]:
import CFNClusterManager

## Delete the cluster only if requested in step 1
if delete_cfncluster:
    CFNClusterManager.delete_cfn_cluster(cluster_name=your_cluster_name)
